Abstract: Dempster-Shafer theory of imprecise probabilities has proved useful for incorporating both nonspecificity and conflict uncertainties in an inference mechanism. The traditional Bayesian approach cannot differentiate between the two, and it is unable to handle nonspecific, ambiguous, and conflicting information without making strong assumptions. This paper presents a generalization of a recent Bayesian-based method of quantifying information flow in Dempster-Shafer theory. The generalization concretely enhances the original method by removing all of the weaknesses highlighted in this paper. Specifically, our generalized method can handle any number of secret inputs to a program, it enables the capturing of an attacker's beliefs in all kinds of sets (singleton or not), and it supports a new and precise quantitative information flow measure whose reported flow results are plausible in that they are bounded by the size of a program's secret input and can be easily associated with the exhaustive search effort needed to uncover a program's secret information, unlike the results reported by the original metric.

Index Terms: computer security, quantitative information flow, imprecise probabilities, Dempster-Shafer theory, information theory, uncertainty, inference, program analysis



I. Introduction 

The goal of information flow analysis is to enforce limits on the use of information that apply to all computations involving that information. For instance, a confidentiality property requires that a program with secret inputs should not leak those inputs into its public outputs. Qualitative information flow properties, such as noninterference, are expensive or impossible to establish, or are rarely satisfied by real programs: generally some flow exists, and many systems remain secure provided that the amount of flow is sufficiently small. Moreover, designers wish to distinguish acceptable from unacceptable flows.

Systems often reveal a summary of secret information they 
store. The summary contains fewer bits and provides a limit 
on the attacker's inference. For instance, a patient's report is 
released with the disease name covered by a black rectangle. 
However, it is not easy to precisely determine how much 
information exists in the summary. For instance, if the font 
size is uniform on the patient's report, the width of the black 
rectangle might determine the length of the disease name. 
Quantitative information flow (QIF) analysis is an approach that 
establishes bounds on information that is leaked by a program. 
In QIF, confidentiality properties are also expressed, but as 
limits on the number of bits that might be revealed from a 
program's execution. A violation is declared if the number of 
leaked bits exceeds the policy. 



The metric in [1] is based on a new perspective for QIF analysis. The fundamental idea is to model an attacker's belief about a program's secret input as a probability distribution over high states. This belief is then revised, using Bayesian updating techniques, as the attacker interacts with a program's execution. The work reported in [1] is believed to be the first to address an attacker's belief in quantifying information flow. This work was later expanded and appeared in [2]. A number of relevant results [3], [4] were reported in the sequel; however, the work in [1], [2] is sufficient as a foundation for our work.

A number of weaknesses can be seen in [2]. First, probability measures are used for capturing an attacker's belief and representing her uncertainty about the true state of a system. These measures have the finite additivity property, which forces them to act on singleton sets and makes it difficult to represent an attacker's ignorance or contradiction. Moreover, these measures cannot model attackers who effectually or ineffectually collaborate with each other. Second, the experiment protocol between an attacker and a system described in [2] cannot handle more than one secret input to a program. Third, the QIF metric advanced in [2] reports counterintuitive flow quantities that exceed the size of a program's secret input, which makes it impossible to determine the size of the exhaustive search space needed to uncover a program's secret information.

This paper presents a generalization of the method followed in [2] that is free of all these weaknesses. The generalization is based on Dempster-Shafer theory of imprecise probabilities [5], [6], which enables capturing an attacker's beliefs in all kinds of sets (singleton or not), combining those beliefs, and revising them to update an attacker's knowledge about a system. As part of this generalization, we propose an inference scheme that an attacker uses to update her knowledge from interacting with a program execution. This scheme can handle any number of secret inputs to a program. The mathematical toolbox on beliefs and the inference scheme we posit in this paper support a new and precise QIF measure whose reported flow results are bounded by the size of a program's secret input and can be easily associated with the exhaustive search effort needed to uncover a program's secret information, unlike the results reported by the original metric.

A. Relation to Our Earlier Work 

In a recent position paper [7], we tackled the counterintuitive results reported by the QIF metric in [2] that exceed the size of a program's secret input, and presented a refinement that bounds those results by a range consistent with the size of a program's secret input. The refinement was accomplished under the original Bayesian settings, and it enabled us to relate the reported flow results to the exhaustive search effort needed to uncover a program's secret information. Readers interested in a clear picture of the problems with the metric in [2] are referred to [7].

B. Plan of the Paper 

The remainder of this paper is organized as follows. Section II discusses the methods of representing uncertainty, starting from the coarse-grained frame of discernment, moving to joint frames, tuples, and tuple sets, and ending with the fine-grained belief functions. In this section, we rigorously clarify the limitations of the probability measures used in [2]. Section III concentrates on capturing beliefs using mass functions and on transforming these functions into belief functions. Our mathematical toolbox on beliefs is given in Section IV. It includes formulas for combining beliefs, conditioning them, and measuring the divergence between them. In this section, we give a clear comparison between the poor properties of the Kullback-Leibler divergence measure [8] (the authors' choice in [2]) and the appealing ones of the Jensen-Shannon divergence measure [9] (our choice). We further investigate and succeed in generalizing the Jensen-Shannon divergence measure in Dempster-Shafer theory. Section V presents the language needed in our experiments. Section VI lifts the syntax and semantics of this language to enable us to write program source code in terms of mass functions. Section VII gives the attacker's model and then presents an inference scheme that an attacker uses to update her knowledge from interacting with a program execution. Section VIII experiments with this inference scheme using various set structures induced by an attacker's beliefs. Our informal reasoning and generic observations about the experiments' results are also given in this section. Section IX deals with quantifying information flow and advances a new and precise QIF measure whose reported flow results are proved to be bounded by the size of a program's secret input and easily associated with the exhaustive search effort needed to uncover a program's secret information. Sample flow calculations are also given in this section. The paper concludes in Section X.

C. Novel Contributions 

We believe that the work reported herein is the first to address the use of Dempster-Shafer theory in quantifying information flow. A number of novel contributions that, to the best of our knowledge, do not appear in the literature are also presented over the course of this correspondence: the generalization of the Jensen-Shannon divergence measure in Dempster-Shafer theory, the rules for updating a mass function and conditioning it on a Boolean expression, and the lifted imperative while-language that acts on mass functions. All the uncertainty computations that appear in this paper are worked out using the pyuds library [10], a Python library we developed specifically for this purpose.



II. Representing Uncertainty 

A. Frame of Discernment 

For most representations of uncertainty, the starting point is a set of possible worlds, states, or elementary outcomes that an agent considers possible. This set is called a frame of discernment [11] (a frame for short). For example, when crudely guessing commonly used passwords, an agent might consider the following set possible:

{password, 123456, qwerty, abc123, letmein, monkey, 696969}

The frames dealt with in this paper are given under the closed-world assumption [11]. For a finite frame $W = \{w_1, \ldots, w_n\}$, this means two things:

1) Exclusiveness: The worlds $w_i$ in $W$ are mutually exclusive, which means that at most one of them is the true world.

2) Exhaustiveness: The frame $W$ is complete, which means that it contains all the possible worlds.

A state in a program execution is an assignment of a value to a variable, and a frame may contain a set of those assignments. For instance, a Boolean variable $a$ accepts two possible assignments, $a \mapsto 0$ or $a \mapsto 1$. It has two possible states that we may write as $\sigma = (a \mapsto 0)$ and $\sigma = (a \mapsto 1)$, and its corresponding frame is $W_a = \{0, 1\}$.

B. Joint Frame, Tuple, and Tuple Set 

A program execution may accept a number of secret (high) and nonsecret (low) inputs. For each input, we have a number of possible states that we should assimilate into an independent frame. To represent an agent's uncertainty about these two types of inputs, we need to define the notions of joint frame, tuple, and tuple set [12].

Definition 1 (Joint Frame): Let $r$ be a finite universal variable set where for each variable $X \in r$ there exists a frame $W_X$ of values that can be assigned to $X$, and let $s \subseteq r$ be a variable set. The joint frame on $s$ is defined by the formula:

$$W_s = \prod_{X \in s} W_X$$

Definition 2 (Tuple): Let $W_s$ be a joint frame on $s \subseteq r$. An $s$-tuple is a function $x$ defined on $s$ that associates a value $x(X) \in W_X$ with each variable $X \in s$.

Definition 3 (Tuple Set): Let $W_s$ be a joint frame on $s \subseteq r$. An $s$-tuple set is a subset $S \subseteq W_s$.

Definitions 1 to 3 allow us to assume two joint frames, a high joint frame $W_h$ on a high variable set $h \subseteq r$ and a low joint frame $W_l$ on a low variable set $l \subseteq r$, to represent an agent's uncertainty about secret and nonsecret inputs, respectively. The overall joint frame $W_{h \cup l}$ on the overall variable set $h \cup l \subseteq r$ emerges as the product of these two frames:

$$W_h = \prod_{X \in h} W_X, \qquad W_l = \prod_{X \in l} W_X, \qquad W_{h \cup l} = \prod_{X \in h \cup l} W_X$$
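As a minimal sketch, assuming each frame $W_X$ is a small finite Python set and encoding an $s$-tuple as a tuple of (variable, value) pairs sorted by variable name (our own encoding, not necessarily the one used in pyuds), the joint frame of Definition 1 is a Cartesian product:

```python
from itertools import product


def joint_frame(frames):
    """Return the joint frame W_s = prod_{X in s} W_X (Definition 1).

    `frames` maps each variable name X to its finite frame W_X. Each s-tuple
    (Definition 2) is encoded as a tuple of (variable, value) pairs sorted by
    variable name, so tuples over the same variable set compare consistently.
    """
    variables = sorted(frames)
    return {
        tuple(zip(variables, values))
        for values in product(*(sorted(frames[v]) for v in variables))
    }


# Hypothetical variable sets: one high variable h and one low variable l.
W_h = joint_frame({"h": {0, 1}})
W_l = joint_frame({"l": {0, 1, 2}})
W_hl = joint_frame({"h": {0, 1}, "l": {0, 1, 2}})

# The overall frame has |W_h| * |W_l| states, e.g. (("h", 0), ("l", 2)).
print(len(W_h), len(W_l), len(W_hl))  # 2 3 6
```

Sorting the variables is only a design choice that makes two tuples over the same variables directly comparable; any canonical ordering would do.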

In the remainder of this correspondence, a frame is always joint unless we state otherwise. When we refer to a frame, we write $W_s$ without stating explicitly that it is taken on the variable set $s \subseteq r$. In addition, states are handled similarly to tuples, and likewise state sets to tuple sets. When we speak of the high and low projections of a state, we mean the projections of that state to $h$ and $l$, respectively.

C. Belief Functions 

A frame is a coarse-grained representation of uncertainty, since it provides no means of comparing the likelihood of two worlds. Belief functions, the cornerstones of Dempster-Shafer theory [5], [6], offer a fine-grained representation of uncertainty that is suitable for our work because they are numeric, thus enabling us to quantitatively measure information flow. They further permit modeling the evolution (or regression) of an agent's knowledge about a system as more and more pieces of evidence become available. Additionally, they admit a programming language semantics, as we will show in Section VI. Finally, under belief functions, all pairs of worlds are comparable, which supports the reasoning of agents and strengthens our analysis.

Although probability measures, the authors' choice in [2], are familiar, quantitative, support operations on beliefs, and admit a programming language semantics, they have the finite additivity property that forces them to act on singleton sets. This makes it difficult to represent ignorance (by assigning a zero probability to a set in an algebra) and contradiction (by assigning a nonzero probability to the empty set). It also complicates assigning probabilities to non-singleton and joint sets. The inability of agents to capture ignorance, express contradiction, and believe in non-singleton and joint sets clearly detracts from the depth of our analysis. In addition, probability measures entail assigning scalar probabilities to all sets in an algebra, but an agent may not have sufficient computational power to do that. This computational burden becomes severe when dealing with huge frames. Lastly, probability measures can only capture independent work, and they fail to model attackers who effectually or ineffectually collaborate with each other, as Example 1 shows.

Example 1 (Modeling Attackers' Collaboration): Consider a band of attackers whose purpose is to hack into a computer system. Assume that this band is partitioned into sub-bands $A_1, A_2, \ldots, A_n$, and let $\mu(A_i)$ be the degree of infiltration achieved by the sub-band $A_i$. For any two sub-bands $A_i$ and $A_j$, it is intuitive that any of the following can happen:

• $\mu(A_i \cup A_j) = \mu(A_i) + \mu(A_j)$ when $A_i$ and $A_j$ work independently.

• $\mu(A_i \cup A_j) > \mu(A_i) + \mu(A_j)$ when $A_i$ and $A_j$ effectually collaborate.

• $\mu(A_i \cup A_j) < \mu(A_i) + \mu(A_j)$ when $A_i$ and $A_j$ ineffectually collaborate.

III. Capturing Belief 

A belief is a psychological state in which an agent has a degree of support for a proposition about a system. A belief is based on a piece of evidence an agent obtains through some means. In the framework of Dempster-Shafer theory, this belief is captured using a mass function, which is defined as follows.

Definition 4 (Mass Function): Let $W_s$ be a frame. A mass function on $W_s$ is a function of the form $m : \mathcal{P}(W_s) \to [0, 1]$, where $\mathcal{P}(W_s)$ is the first-order power set of $W_s$ defined as $\mathcal{P}(W_s) = \{X \mid X \subseteq W_s\}$. This function satisfies:

$$m(\emptyset) = 0, \qquad \sum_{A \in \mathcal{P}(W_s)} m(A) = 1$$

For any $A \in \mathcal{P}(W_s)$, the value $m(A)$ has the following meaning: it characterizes the degree of belief that the true world is in the tuple set $A$, but it does not take into account any additional evidence for the various subsets of $A$.

Each tuple set $X \in \mathcal{P}(W_s)$ such that $m(X) > 0$ is called a focal set of $m$. We denote the set of all focal sets induced by $m$ as $\mathcal{F}_m$ and write:

$$\mathcal{F}_m = \{X \in \mathcal{P}(W_s) \mid m(X) > 0\}$$

We call the pair $(\mathcal{F}_m, m)$ a body of evidence. Occasionally, we denote the domain $\mathcal{P}(W_s)$ of $m$ as $d(m)$. Definition 5 shows how to project a mass function.
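A mass function is easy to store as a dictionary from frozensets (tuple sets) to masses. The sketch below, whose helper names are our own and not taken from pyuds, checks the conditions of Definition 4 and extracts $\mathcal{F}_m$. Note how mass left on the whole frame expresses ignorance, which a probability distribution over singletons cannot express directly:

```python
import math


def is_mass_function(m, frame, tol=1e-9):
    """Check Definition 4 for m: dict[frozenset, float] on the given frame."""
    if any(not A <= frame for A in m):
        return False  # every tuple set must be a subset of the frame
    if m.get(frozenset(), 0.0) != 0.0:
        return False  # m(empty set) = 0
    if any(not 0.0 <= v <= 1.0 for v in m.values()):
        return False
    return math.isclose(sum(m.values()), 1.0, abs_tol=tol)


def focal_sets(m):
    """F_m: the tuple sets that receive strictly positive mass."""
    return {A for A, v in m.items() if v > 0}


W = frozenset({"password", "123456", "qwerty", "abc123",
               "letmein", "monkey", "696969"})
# Hypothetical evidence: 0.6 on a non-singleton set, 0.4 left on W (ignorance).
m = {frozenset({"password", "123456"}): 0.6, W: 0.4}

assert is_mass_function(m, W)
print(len(focal_sets(m)))  # 2
```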

Definition 5 (Mass Function Projection): Let $W_s$ be a frame, $m : \mathcal{P}(W_s) \to [0, 1]$ be a mass function on $W_s$, and $t \subseteq s$ be a variable set. The projection of $m$ to $t$ is defined for any $A \in \mathcal{P}(W_t)$ by the formula:

$$m^{\downarrow t}(A) = \sum_{B^{\downarrow t} = A} m(B)$$

where $B^{\downarrow t}$ is the projection of the tuple set $B \in \mathcal{P}(W_s)$ to $t$.
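Assuming tuples are encoded as sorted (variable, value) pairs and mass functions as dictionaries over frozensets (both assumptions of this sketch), Definition 5 can be written as follows; masses of tuple sets that collapse onto the same projection are added:

```python
from collections import defaultdict


def project_tuple_set(B, t):
    """B^{down t}: restrict every tuple in B to the variables in t."""
    return frozenset(tuple(pair for pair in x if pair[0] in t) for x in B)


def project_mass(m, t):
    """m^{down t}(A) = sum of m(B) over all B whose projection to t is A."""
    out = defaultdict(float)
    for B, v in m.items():
        out[project_tuple_set(B, t)] += v
    return dict(out)


# Hypothetical mass function on W_{h, l}; tuples are sorted (variable, value) pairs.
m = {
    frozenset({(("h", 0), ("l", 0)), (("h", 1), ("l", 0))}): 0.5,
    frozenset({(("h", 0), ("l", 1))}): 0.3,
    frozenset({(("h", 0), ("l", 2))}): 0.2,
}
m_h = project_mass(m, {"h"})
# The last two focal sets both project to {(h -> 0)}, so their masses add up.
print(m_h)
```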

As a specialization of the general mass function, we define 
a point mass function as follows. 

Definition 6 (Point Mass Function): Let $W_s$ be a frame, and $m : \mathcal{P}(W_s) \to [0, 1]$ be a mass function on $W_s$. We say that $m$ is a point on the tuple set $A \in \mathcal{P}(W_s)$, and write $m_A$, if the degree of belief characterized by $m$ is fully concentrated on $A$, that is, if $m(A) = 1$.

Since it does not have the finite additivity property, a mass function $m$ is not a measure. This can be remedied: one can bind the pieces of evidence together and obtain a belief measure from $m$ using the formula:

$$Bel(A) = \sum_{B \subseteq A} m(B)$$
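Continuing the password example with a dictionary-of-frozensets encoding (an assumption of this sketch), the belief measure sums the masses of all focal sets contained in $A$. Because mass on a non-singleton set is not split among its elements, $Bel(\{password\})$ and $Bel(W \setminus \{password\})$ can both be zero, which is exactly the non-additivity discussed above:

```python
def bel(m, A):
    """Bel(A): total mass of the focal sets contained in A."""
    return sum(v for B, v in m.items() if B <= A)


W = frozenset({"password", "123456", "qwerty", "abc123",
               "letmein", "monkey", "696969"})
m = {frozenset({"password", "123456"}): 0.6, W: 0.4}  # hypothetical evidence

print(bel(m, frozenset({"password", "123456"})))  # 0.6
print(bel(m, frozenset({"password"})))            # 0
print(bel(m, W - {"password"}))                   # 0
```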

Since the tuple sets in the domain of the function $Bel : \mathcal{P}(W_s) \to [0, 1]$ are measurable, normalizing the values $Bel(A)$ so that they sum to 1 allows us to apply the familiar distribution arithmetic to them, i.e., distribution sum, product, conditioning, and difference [13]. However, this is not what we want to do. Converting the values $m(A)$ to $Bel(A)$ is an expensive operation that should be kept to a minimum. Moreover, dealing with the values $m(A)$ is more tractable than dealing with $Bel(A)$. Thus, we maintain the mass function setting in our work and propose the following arithmetic on beliefs.



IV. Arithmetic on Beliefs 

A. Belief Combination 

We combine beliefs using Dempster's combination rule [14]. Given two pieces of evidence obtained from two independent sources (we discuss independence shortly) and expressed by two mass functions $m_1$ and $m_2$ on the same frame $W_s$, Dempster's combination rule aggregates $m_1$ and $m_2$ to obtain a combined mass function $m_1 \oplus m_2$, which is defined for any tuple set $\emptyset \neq A \in \mathcal{P}(W_s)$ by the formula:

$$(m_1 \oplus m_2)(A) = k \sum_{B \cap C = A} m_1(B)\, m_2(C) \tag{1}$$

where:

$$(m_1 \oplus m_2)(\emptyset) = 0, \qquad k^{-1} = \sum_{B \cap C \neq \emptyset} m_1(B)\, m_2(C)$$

If $m_1$ and $m_2$ are defined on two different frames $W_s$ and $W_t$, then the intersection $B \cap C$ is no longer applicable and is replaced with the natural join operation $B \bowtie C$ [12], as expressed by the following formula, which is defined for any tuple set $\emptyset \neq A \in \mathcal{P}(W_{s \cup t})$:

$$(m_1 \oplus m_2)(A) = k \sum_{B \bowtie C = A} m_1(B)\, m_2(C) \tag{2}$$

where:

$$(m_1 \oplus m_2)(\emptyset) = 0, \qquad k^{-1} = \sum_{B \bowtie C \neq \emptyset} m_1(B)\, m_2(C)$$

The parameter $k$ in formulas (1) and (2) normalizes $m_1 \oplus m_2$, which has the appeal of explicitly recognizing conflict between the pieces of evidence an agent gathers about a system [15].
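A minimal sketch of rule (1), assuming both mass functions are dictionaries over frozensets of the same frame; the natural-join variant (2) would replace `B & C` with a join of tuple sets and is not shown. The variable `k_inv` plays the role of $k^{-1}$, and the example shows highly conflicting evidence being renormalized:

```python
from collections import defaultdict
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule (1) for two mass functions on the same frame.

    Mass functions are dicts mapping frozensets (tuple sets) to masses.
    """
    raw = defaultdict(float)
    for (B, x), (C, y) in product(m1.items(), m2.items()):
        A = B & C
        if A:                      # products landing on the empty set are conflict
            raw[A] += x * y
    k_inv = sum(raw.values())      # k^{-1}: total mass with B ∩ C != ∅
    if k_inv == 0:
        raise ValueError("totally conflicting evidence: combination undefined")
    return {A: v / k_inv for A, v in raw.items()}


W = frozenset("abc")
m1 = {frozenset("a"): 0.9, W: 0.1}
m2 = {frozenset("b"): 0.9, W: 0.1}
m12 = dempster_combine(m1, m2)
# 0.81 of the product mass falls on {a} ∩ {b} = ∅ and is renormalized away.
print({"".join(sorted(A)): round(v, 4) for A, v in m12.items()})
# {'a': 0.4737, 'b': 0.4737, 'abc': 0.0526}
```

Swapping the two arguments yields the same mass function, in line with the commutativity of the rule.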

A prerequisite for using Dempster's combination rule is that the pieces of evidence are obtained from independent sources. Intuitively, this means that these pieces are totally unrelated and that the occurrence of one of them has no influence on the other [11]. In our work, this is well justified if the pieces of evidence are obtained from external sources that are unrelated to a program execution; however, it is not if the pieces are obtained by monitoring an execution: in repeated executions, an agent relies on one output to choose the next input and thus influences the next output [1].

Dempster's combination rule has the distinguishing property of being commutative and associative [11]. This empowers our analysis by allowing an agent to choose the combination order and to postpone the combination of a misleading piece of evidence until more hints about it are available.

B. Belief Conditioning 

We condition beliefs using Dempster's conditioning rule [14]. Suppose that an agent's current belief is captured using a mass function $m : \mathcal{P}(W_s) \to [0, 1]$. Later on, this agent obtains a new piece of evidence that the true world is in the tuple set $B \in \mathcal{P}(W_s)$. Suppose further that there exists a focal set $C \in \mathcal{F}_m$ such that $C \cap B \neq \emptyset$. Dempster's conditioning rule enables the agent to incorporate the new evidence and update her knowledge. This rule transforms $m$ into a new mass function $m_B$, as expressed by the following formula, which is defined for any tuple set $A \in \mathcal{P}(W_s)$:

$$m_B(A) = \begin{cases} k \displaystyle\sum_{C \cap B = A} m(C) & \text{for } A \neq \emptyset \\ 0 & \text{for } A = \emptyset \end{cases} \tag{3}$$

where:

$$k^{-1} = \sum_{C \cap B \neq \emptyset} m(C)$$

The parameter $k$ has the effect of normalizing $m_B(A)$, and it enjoys the same quality mentioned in the previous subsection.
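Under a dictionary-of-frozensets encoding (our assumption), rule (3) can be sketched directly; it gives the same result as combining $m$ with a point mass on $B$ via rule (1). In the example, the focal set $\{c\}$ is ruled out by the evidence and the remaining masses are renormalized by $k$:

```python
from collections import defaultdict


def dempster_condition(m, B):
    """Dempster's conditioning rule (3): m_B(A) = k * sum_{C ∩ B = A} m(C)."""
    raw = defaultdict(float)
    for C, v in m.items():
        A = C & B
        if A:                       # focal sets disjoint from B are discarded
            raw[A] += v
    k_inv = sum(raw.values())       # k^{-1}: sum of m(C) over C ∩ B != ∅
    if k_inv == 0:
        raise ValueError("no focal set intersects B: conditioning undefined")
    return {A: v / k_inv for A, v in raw.items()}


m = {frozenset("a"): 0.5, frozenset("c"): 0.3, frozenset("abc"): 0.2}
m_B = dempster_condition(m, frozenset("ab"))
print({"".join(sorted(A)): round(v, 3) for A, v in m_B.items()})
# {'a': 0.714, 'ab': 0.286}
```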

C. Belief Divergence 

1) Choosing a Divergence Measure: An agent's belief about a program's secret input is modeled as a probability distribution in [2], and the divergence between two probability distributions is measured using the Kullback-Leibler divergence [8], which is given in Definition 7.

Definition 7 (Kullback-Leibler Divergence Measure): Let $X$ be a discrete random variable with alphabet $\mathcal{X}$, and let $p_1$ and $p_2$ be two probability distribution functions on $\mathcal{X}$. The Kullback-Leibler divergence measure between $p_1$ and $p_2$ is defined by the formula:

$$KL(p_1, p_2) = \sum_{x \in \mathcal{X}} p_1(x) \log \frac{p_1(x)}{p_2(x)}$$

Our work necessitates a divergence measure between mass functions, not between probability distributions. KL divergence cannot be written in terms of a generalizable uncertainty functional, and thus it does not seem generalizable in Dempster-Shafer theory to act on mass functions. In contrast, the Jensen-Shannon divergence measure [9] has an obvious information-theoretic interpretation in terms of the Shannon uncertainty functional, which makes it generalizable in Dempster-Shafer theory, in addition to a number of desirable properties that KL lacks. Before defining the Jensen-Shannon divergence measure, we need to define the Shannon uncertainty functional.

Definition 8 (Shannon Uncertainty Functional): Let $X$ be a discrete random variable with alphabet $\mathcal{X}$, and let $p$ be a probability distribution function on $\mathcal{X}$. The uncertainty about $X$ is defined by the functional:

$$S(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$

Uncertainty is measured in bits if the logarithm is binary. (Here and hereafter, all logarithms are to the base 2.)

Definition 9 (Jensen-Shannon Divergence Measure): Let $p_1$ and $p_2$ be two probability distribution functions. The Jensen-Shannon divergence measure between $p_1$ and $p_2$ is defined by the formula:

$$JS(p_1, p_2) = 2S\left(\frac{p_1 + p_2}{2}\right) - S(p_1) - S(p_2)$$
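A minimal numeric sketch of Definitions 7 to 9, using plain Python lists as distributions over a shared alphabet (the function names are ours). It illustrates the asymmetry and possible infiniteness of KL and the symmetry and boundedness of JS. With the factor 2 in Definition 9, JS is twice the more common normalization of the Jensen-Shannon divergence, which is why its maximum in bits is 2 rather than 1:

```python
import math


def shannon(p):
    """S(p) in bits (Definition 8); zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def kl(p1, p2):
    """KL(p1, p2) in bits (Definition 7); infinite if p1(x) > 0 = p2(x)."""
    total = 0.0
    for a, b in zip(p1, p2):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log2(a / b)
    return total


def js(p1, p2):
    """JS(p1, p2) = 2 S((p1 + p2) / 2) - S(p1) - S(p2) (Definition 9)."""
    mid = [(a + b) / 2 for a, b in zip(p1, p2)]
    return 2 * shannon(mid) - shannon(p1) - shannon(p2)


p1, p2 = [1.0, 0.0], [0.5, 0.5]
print(kl(p1, p2), kl(p2, p1))      # 1.0 inf  (asymmetric, not always finite)
print(js(p1, p2) == js(p2, p1))    # True
print(js([1.0, 0.0], [0.0, 1.0]))  # 2.0  (the upper bound)
```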

In Table I we compare the KL and JS divergence measures. P3 is a salient property that maintains the balance and computational correctness of the information flow measure we will advance in Section IX. P4 is important in its own right, since it enables us to handle all possible belief combinations, including those where one belief is zero and the other is positive. The failure of KL to satisfy P4 drives the authors of [1] to suggest an admissibility restriction on beliefs, whose ineffectiveness is revealed in our earlier work [7]. We also see that P6 is appealing to have in our work. Indeed, it decidedly contributes to the desirable boundedness of the flow measure we will propose in Section IX.

TABLE I
Comparison between KL and JS divergence measures

No. | Property | KL | JS
P1 | $D(p_1, p_2) > 0$ iff $p_1(x) \neq p_2(x)$ | Yes | Yes
P2 | $D(p_1, p_2) = 0$ iff $p_1(x) = p_2(x)$ | Yes | Yes
P3 | $D(p_1, p_2) = D(p_2, p_1)$ | No | Yes
P4 | Finiteness (definedness) | Not if a term $p \log \frac{p}{0}$ occurs | Yes
P5 | Upper and lower bounds | No, only a lower bound | Yes
P6 | Boundedness | No | Yes, $JS \leq 2$

2) Generalizing the Divergence Measure: As we saw in Definition 9, JS is written in terms of S. Therefore, generalizing JS in Dempster-Shafer theory entails generalizing S in the same theory. The hunt for a generalization of S in Dempster-Shafer theory starts by noticing that two types of uncertainty coexist in this theory:

1) The nonspecificity in our prediction about the true world 
in a frame. 

2) The conflict between the pieces of evidence expressed by 
each mass value. 

To measure nonspecificity in Dempster-Shafer theory, we use the generalized Hartley uncertainty functional [15], which is given in Definition 10.

Definition 10 (Generalized Hartley Uncertainty Functional): Let $m : \mathcal{P}(W_s) \to [0, 1]$ be a mass function on $W_s$, and $\mathcal{F}_m$ be the set of all focal sets induced by $m$. The nonspecificity uncertainty about the true world in $W_s$ is given by the functional:

$$GH(m) = \sum_{A \in \mathcal{F}_m} m(A) \log |A|$$
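As a sketch in a dictionary-of-frozensets encoding (assumed here, not the pyuds API), GH only looks at the cardinalities of the focal sets: it equals $\log|W_s|$ for the vacuous mass function and 0 when all mass sits on singletons:

```python
import math


def generalized_hartley(m):
    """GH(m) = sum over focal sets A of m(A) * log2 |A| (Definition 10)."""
    return sum(v * math.log2(len(A)) for A, v in m.items() if v > 0)


W = frozenset("abcd")
print(generalized_hartley({W: 1.0}))                       # 2.0  (total ignorance)
print(generalized_hartley({frozenset("a"): 1.0}))          # 0.0  (fully specific)
print(generalized_hartley({frozenset("a"): 0.5, W: 0.5}))  # 1.0
```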

To measure both nonspecificity and conflict in aggregate in Dempster-Shafer theory, we use the aggregate uncertainty functional [15], which is given in Definition 11.

Definition 11 (Aggregate Uncertainty Functional): Let $Bel : \mathcal{P}(W_s) \to [0, 1]$ be a belief function on $W_s$. The aggregate uncertainty about the true world in $W_s$ is given by the functional:

$$AU(Bel) = \max_{p \in \mathcal{P}_{Bel}} \left\{ -\sum_{x \in W_s} p(x) \log p(x) \right\}$$

where $\mathcal{P}_{Bel}$ is the set of all probability distribution functions that dominate $Bel$ by satisfying the following two properties:

1) $p(x) \in [0, 1]$ for any $x \in W_s$, and $\sum_{x \in W_s} p(x) = 1$

2) $Bel(A) \leq \sum_{x \in A} p(x)$ for any $A \in \mathcal{P}(W_s)$

A recursive algorithm for computing AU is given in Appendix A [15]. It can be shown that AU is insensitive to changes in evidence, which makes it ill-suited for capturing the uncertainty associated with an agent's beliefs [15]. Therefore, AU is not what we need in order to generalize JS in Dempster-Shafer theory. However, if we recall that AU is the total of two types of uncertainty, nonspecificity and conflict, we can write:

$$AU(Bel) = GH(m) + GS(m)$$

Based on this equivalence, we can define the generalized Shannon uncertainty functional.

Definition 12 (Generalized Shannon Uncertainty Functional): Let $m : \mathcal{P}(W_s) \to [0, 1]$ be a mass function, and $Bel : \mathcal{P}(W_s) \to [0, 1]$ be the corresponding belief function, both on $W_s$. The conflict uncertainty about the true world in $W_s$ is given by the functional:

$$GS(m) = AU(Bel) - GH(m)$$

where $GH(m)$ and $AU(Bel)$ are given in Definitions 10 and 11, respectively.
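The following brute-force sketch (exponential in $|W_s|$, so suitable only for toy frames) implements our reading of the recursive AU algorithm of [15] in iterative form, and then GS as in Definition 12. Mass functions are encoded as dictionaries from frozensets to masses, and all function names are our own assumptions. For a Bayesian mass function, AU reduces to the Shannon entropy and GH to 0:

```python
import math
from itertools import combinations


def bel(m, A):
    """Bel(A) = sum of m(B) over focal sets B contained in A."""
    return sum(v for B, v in m.items() if B <= A)


def nonempty_subsets(X):
    items = sorted(X)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def aggregate_uncertainty(m, frame, tol=1e-12):
    """AU(Bel), brute force.

    Repeatedly pick the subset A of the remaining worlds maximizing
    Bel'(A)/|A| (the largest such A on ties), give each x in A that
    probability, and continue with Bel'(B) = Bel(B ∪ R) - Bel(R), where R
    is the set of worlds already assigned.
    """
    assigned, remaining, p = frozenset(), frozenset(frame), {}
    while remaining:
        base = bel(m, assigned)
        if bel(m, remaining | assigned) - base <= tol:
            p.update({x: 0.0 for x in remaining})  # no belief left
            break
        best, best_ratio = None, -1.0
        for A in nonempty_subsets(remaining):
            ratio = (bel(m, A | assigned) - base) / len(A)
            if ratio > best_ratio + tol or (
                abs(ratio - best_ratio) <= tol and len(A) > len(best)
            ):
                best, best_ratio = A, ratio
        p.update({x: best_ratio for x in best})
        assigned, remaining = assigned | best, remaining - best
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def generalized_hartley(m):
    return sum(v * math.log2(len(A)) for A, v in m.items() if v > 0)


def generalized_shannon(m, frame):
    """GS(m) = AU(Bel) - GH(m) (Definition 12)."""
    return aggregate_uncertainty(m, frame) - generalized_hartley(m)


W = frozenset("ab")
m = {frozenset("a"): 0.4, W: 0.6}
print(aggregate_uncertainty(m, W), generalized_hartley(m),
      generalized_shannon(m, W))  # 1.0 0.6 0.4
```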

Notice in Definition 12 that the insensitivity of AU is overcome by subtracting GH from it. This makes GS sensitive to changes in evidence and allows us to proceed with our novel generalization of JS in Dempster-Shafer theory.

Definition 13 (Generalized Jensen-Shannon Divergence Measure): Let $m_1$ and $m_2$ be two mass functions on $W_s$. The generalized Jensen-Shannon divergence measure between $m_1$ and $m_2$ is defined by the formula:

$$GJS(m_1, m_2) = 2\,GS\left(\frac{m_1 + m_2}{2}\right) - GS(m_1) - GS(m_2)$$

where GS is given in Definition 12.
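Definition 13 can be checked numerically with a self-contained brute-force sketch (the dictionary-of-frozensets encoding and the function names are our assumptions; the cost is exponential in $|W_s|$). The average $(m_1 + m_2)/2$ is taken pointwise over tuple sets. For Bayesian mass functions GH vanishes and AU is the Shannon entropy, so GJS coincides with JS from Definition 9; two opposite point masses on a two-world frame give 2:

```python
import math
from itertools import combinations


def bel(m, A):
    return sum(v for B, v in m.items() if B <= A)


def aggregate_uncertainty(m, frame, tol=1e-12):
    """AU(Bel) by brute force (iterative form of the recursive algorithm)."""
    assigned, remaining, p = frozenset(), frozenset(frame), {}
    while remaining:
        base = bel(m, assigned)
        if bel(m, remaining | assigned) - base <= tol:
            p.update({x: 0.0 for x in remaining})
            break
        subsets = (frozenset(c) for r in range(1, len(remaining) + 1)
                   for c in combinations(sorted(remaining), r))
        best, best_ratio = None, -1.0
        for A in subsets:
            ratio = (bel(m, A | assigned) - base) / len(A)
            if ratio > best_ratio + tol or (
                    abs(ratio - best_ratio) <= tol and len(A) > len(best)):
                best, best_ratio = A, ratio
        p.update({x: best_ratio for x in best})
        assigned, remaining = assigned | best, remaining - best
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def gs(m, frame):
    """GS(m) = AU(Bel) - GH(m)."""
    gh = sum(v * math.log2(len(A)) for A, v in m.items() if v > 0)
    return aggregate_uncertainty(m, frame) - gh


def gjs(m1, m2, frame):
    """GJS(m1, m2) = 2 GS((m1 + m2)/2) - GS(m1) - GS(m2) (Definition 13)."""
    avg = {A: (m1.get(A, 0.0) + m2.get(A, 0.0)) / 2 for A in set(m1) | set(m2)}
    return 2 * gs(avg, frame) - gs(m1, frame) - gs(m2, frame)


W = frozenset("ab")
m1, m2 = {frozenset("a"): 1.0}, {frozenset("b"): 1.0}
print(gjs(m1, m2, W))  # 2.0, the value JS gives for (1, 0) and (0, 1)
```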

Now we have to check whether the properties of JS listed in Table I hold for GJS. We know that for any $m$ we have $GS(m) \geq 0$, which means that P1 holds for GJS. P2 and P3 obviously hold for GJS. It is known that $GH(m) \leq \log|W_s|$ and $AU(Bel) \leq \log|W_s|$ for any $m$ and $Bel$ on $W_s$ [15]. This means that $GS(m) \leq \log|W_s|$ and, since $GS \geq 0$, that $GJS(m_1, m_2) \leq 2\,GS\left(\frac{m_1 + m_2}{2}\right) \leq 2\log|W_s|$. Thus, P4 and P6 also hold.

V. Language 

We use an imperative while-language extended with a probabilistic choice construct. The language is described using rules that show how expressions and commands are formed, how expressions are evaluated, and how commands are executed.

A. Syntax

The syntactic sets and the metavariables that range over them are shown in Table II. The formation rules of arithmetic and Boolean expressions are standard, and we only give the formation rules of commands:

$$c ::= \texttt{skip} \mid X := a \mid c_0; c_1 \mid \texttt{if } b \texttt{ then } c_0 \texttt{ else } c_1 \mid \texttt{while } b \texttt{ do } c \mid c_0\ {}_p[]\ c_1$$

The probabilistic choice rule $c_0\ {}_p[]\ c_1$ executes $c_0$ with probability $p$ or $c_1$ with probability $1 - p$.



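TABLE II
The syntactic sets and the metavariables

Syntactic Set | Metavariables
Val: the set of integers $\mathbb{Z}$ | $n, m$
Bool: the set of truth values {true, false} | $t$
Var: the set of program variables | $X, Y$
Aexp: the set of arithmetic expressions | $a$
Bexp: the set of Boolean expressions | $b$
Com: the set of commands | $c$

TABLE III
The execution rules of commands

$$\begin{aligned}
[\texttt{skip}]\sigma &= \lambda\sigma \in State.\ \sigma \\
[X := a]\sigma &= \lambda\sigma \in State.\ \sigma[X \mapsto n] \text{ where } [a]\sigma = n \\
[c_0; c_1]\sigma &= ([c_1] \circ [c_0])\sigma = \lambda\sigma \in State.\ [c_1]([c_0]\sigma) \\
[\texttt{if } b \texttt{ then } c_0 \texttt{ else } c_1]\sigma &= \lambda\sigma \in State.\ ([b]\sigma, [c_0]\sigma, [c_1]\sigma) \\
[\texttt{while } b \texttt{ do } c]\sigma &= \lambda\sigma \in State.\ \text{least fixed point of } \Gamma : (State \to State) \to (State \to State) \\
&\qquad \text{where } \Gamma(\varphi) = \lambda\sigma \in State.\ ([b]\sigma, (\varphi \circ [c])\sigma, \sigma) \\
[c_0\ {}_p[]\ c_1]\sigma &= \lambda\sigma \in State.\ p \times [c_0]\sigma + (1 - p) \times [c_1]\sigma
\end{aligned}$$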



B. Semantics 

Recalling that a state in our scheme is an assignment of a value to a variable (as mentioned in Section II-A), and having introduced the syntactic sets in the previous subsection, we can now denote a state as a function of the form $\sigma : Var \to Val$. When we write $\sigma(X) = n$ or $\sigma(X \mapsto n)$ for $X \in Var$ and $n \in Val$, we mean that the value of the variable $X$ in the state $\sigma$ is $n$. We might have more than one variable in a single state, in which case we write $\sigma(X, Y) = (n, m)$ or $\sigma(X \mapsto n, Y \mapsto m)$ for $X, Y \in Var$ and $n, m \in Val$. A notation $State$ is also needed to refer to the set of all possible states in a program execution. We use the following semantic functions:

$$\begin{aligned} \mathcal{A} &: Aexp \to (State \to Val) \\ \mathcal{B} &: Bexp \to (State \to Bool) \\ \mathcal{C} &: Com \to (State \to State) \end{aligned}$$

which enable us to define the following denotation functions:

$$\begin{aligned} \forall a \in Aexp.\ \mathcal{A}[a] &: State \to Val \\ \forall b \in Bexp.\ \mathcal{B}[b] &: State \to Bool \\ \forall c \in Com.\ \mathcal{C}[c] &: State \to State \end{aligned}$$

Since the semantic functions are known, as well as the range of the metavariables, we condense the denotations and write $[a]$, $[b]$, and $[c]$ instead of $\mathcal{A}[a]$, $\mathcal{B}[b]$, and $\mathcal{C}[c]$.

The evaluation of arithmetic and Boolean expressions is standard. As for commands, we note that their execution changes program states. Unless the corresponding program inputs are influenced by an agent, we assume that all variables are initially set to zero, that is, $\forall X \in Var.\ \sigma_0(X) = 0$. We also observe that a command execution may terminate in a final state, or may diverge and never yield a final state (nontermination). Let us explain the meaning of termination in this non-lifted semantics.

Definition 14 (Non-lifted Meaning of Termination): For any $c \in Com$, when we write:

$$[c]\sigma' = \lambda\sigma \in State.\ \sigma$$

we mean that the command $c$, which began in an input state $\sigma'$, deterministically terminates in an output state $\sigma$.

The execution rules of commands are given in Table III. The notion given in Definition 15 is used in one of those rules.

Definition 15 (State Update): Let $\sigma \in State$, $X \in Var$, and $n \in Val$ be a state, a variable, and a value, respectively. The state obtained from $\sigma$ by changing the value of $X$ to $n$ in $\sigma$ is denoted as $\sigma[X \mapsto n]$. Formally, we write:

$$\sigma[X \mapsto n](Y) = \begin{cases} n & \text{if } Y = X \\ \sigma(Y) & \text{if } Y \neq X \end{cases}$$

We also make use of the following compact notation:

$$(b, x, x') = \begin{cases} x & \text{if } b = \text{true} \\ x' & \text{if } b = \text{false} \end{cases}$$


VI. Lifted Language 

In this section, we lift the language presented in Section V so that it acts on mass functions. Our lifted language is the first of its kind to enjoy this property. The upgrade process involves both the syntax and the semantics.

A. Lifting the Syntax

We need to add one more syntactic set to the sets shown in Table II, which is what we do in Definition 16.

Definition 16 (The MASS Syntactic Set): Let $W_{h \cup l}$ be a frame on the overall variable set $h \cup l \subseteq r$ that contains a program's secret and nonsecret inputs. We define the syntactic set MASS to be the set of all mass functions on $W_{h \cup l}$, and we use the metavariables $m$ and $m'$ to range over MASS.

One more formation rule is also needed for any $m \in MASS$, and it is conveniently already prescribed in Definition 4.

B. Lifting the Semantics 

Assuming input (output) masses, when we write $m(\sigma) = n$ for $m \in MASS$, $\sigma \in State$, and $n \in [0, 1]$, we mean the likelihood that $\sigma$ is to be used as an input (output) state. The only semantic function we need to lift is the one pertaining to commands. The lifted command semantic and denotation functions are defined by the mappings:

$$\begin{aligned} \mathcal{C} &: Com \to (MASS \to MASS) \\ \forall c \in Com.\ \mathcal{C}[c] &: MASS \to MASS \end{aligned}$$

The meaning of termination also changes in the lifted se- 
mantics as shown in Definition [171 

Definition 17 (Lifted Meaning of Termination): For any c G 
Com, when we write: 



[c]m! = Am G MASS^jm(a).[c]a 



we mean that the command c, which began in any input state a' 
of d(m'), potentially terminates in any output state a of d(m). 



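TABLE IV

The lifted execution rules of commands

$$[\![\texttt{skip}]\!]m = \lambda m \in \mathbf{MASS}.\ m$$
$$[\![X := a]\!]m = \lambda m \in \mathbf{MASS}.\ m[X \mapsto n] \text{ where } [\![a]\!]\sigma = n \text{ for any } \sigma \in \partial(m)$$
$$[\![c_0; c_1]\!]m = ([\![c_1]\!] \circ [\![c_0]\!])m = \lambda m \in \mathbf{MASS}.\ [\![c_1]\!]([\![c_0]\!]m)$$
$$[\![\texttt{if } b \texttt{ then } c_0 \texttt{ else } c_1]\!]m = \lambda m \in \mathbf{MASS}.\ [\![c_0]\!](m|b) + [\![c_1]\!](m|\neg b)$$
$$[\![\texttt{while } b \texttt{ do } c]\!]m = \lambda m \in \mathbf{MASS}.\ \text{least fixed point of } F : (\mathbf{MASS} \to \mathbf{MASS}) \to (\mathbf{MASS} \to \mathbf{MASS}), \text{ where } F(\varphi) = \lambda m \in \mathbf{MASS}.\ \varphi([\![c]\!](m|b)) + (m|\neg b)$$
$$[\![c_0 \ {}_{p}[\,]\ c_1]\!]m = \lambda m \in \mathbf{MASS}.\ [\![c_0]\!](p \times m) + [\![c_1]\!]((1 - p) \times m)$$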



The sum on the right-hand side of the formula in Definition 17 specifies the likelihood of this termination.

In this context, we also need to give our novel definition of a mass update.

Definition 18 (Mass Update): Let $m \in \mathbf{MASS}$, $X \in \mathbf{Var}$, and $n \in \mathbf{Val}$ be a mass function, a variable, and a value, respectively. The mass function obtained from $m$ by changing the value of $X$ to $n$ in all the states of $\partial(m)$ is denoted $m[X \mapsto n]$. Formally, we achieve this as follows:

1) $\forall \sigma \in \partial(m).\ \sigma' = \sigma[X \mapsto n] \in \partial(m[X \mapsto n])$

2) $m[X \mapsto n](\sigma') = \begin{cases} m(\sigma) & \text{if } X \in \sigma \\ m(\sigma') & \text{if } X \notin \sigma \end{cases}$
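To make Definition 18 concrete, the following Python sketch represents a state as a tuple of values ordered by a variable list and a mass function as a dictionary from focal sets (frozensets of states) to masses. These representation choices and the names `VARS`, `update_state`, and `mass_update` are ours, not the paper's. When two focal sets become identical after the update, the sketch adds their masses, which we assume is the intended reading.

```python
# Hypothetical representation (ours, not the paper's): a state is a tuple of
# values ordered as in VARS, and a mass function is a dict that maps
# frozensets of states (focal sets) to masses.
VARS = ("p", "g", "a")


def update_state(state, var, value):
    """Return sigma[X |-> n]: a copy of the state with `var` set to `value`."""
    i = VARS.index(var)
    return state[:i] + (value,) + state[i + 1:]


def mass_update(m, var, value):
    """Return m[X |-> n]: rewrite X to n in every state of every focal set."""
    result = {}
    for focal, mass in m.items():
        new_focal = frozenset(update_state(s, var, value) for s in focal)
        # Assumption: focal sets that become equal after the update pool their mass.
        result[new_focal] = result.get(new_focal, 0.0) + mass
    return result
```

For instance, `mass_update({frozenset({("A", "A", 0)}): 1.0}, "a", 1)` returns `{frozenset({("A", "A", 1)}): 1.0}`, and the empty focal set is left unchanged.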

The lifted execution rules of commands are given in Table IV. These rules follow immediately from applying the formulas in Definitions 17 and 18 to the execution rules given in Table III. Notice that in the lifted rules we condition a mass function on a Boolean expression, which formula (3) cannot do. We give a novel adaptation of this formula in Definition 19.

Definition 19 (Boolean Expression Conditioning): Let $m : \mathcal{P}(W_s) \to [0, 1]$ be a mass function on $W_s$, and let $b \in \mathbf{Bexp}$ be a Boolean expression. The expansion of $b$ to the domain $\mathcal{P}(W_s)$ of $m$ yields the tuple set $B \subseteq W_s$ whose tuples satisfy $b$, i.e., $B = \{x \in W_s \mid x \models b\}$. The conditioning of $m$ on $b$ is then given by the formula:

$$(m|b)(A) = \sum_{C \cap B = A} m(C) \quad \text{for all } A \in \mathcal{P}(W_s)$$

Notice that the resulting mass function is unnormalized: the mass of every focal set disjoint from $B$ is assigned to the empty set.
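Boolean expression conditioning can be sketched in a few lines of Python. The sketch follows the worked example of Section VIII-A, where the mass of focal sets disjoint from $B$ lands on the empty set. The representation (frozensets of $(p, g, a)$ tuples mapped to masses), the frame `W`, and the name `condition_on` are hypothetical choices for illustration.

```python
from itertools import product


def condition_on(m, B):
    """Return m|b: move each focal set's mass onto its intersection with B.

    Focal sets disjoint from B send their mass to the empty set, so the
    result is generally unnormalized.
    """
    result = {}
    for focal, mass in m.items():
        target = focal & B
        result[target] = result.get(target, 0.0) + mass
    return result


# Hypothetical frame W_{h u l}: states (p, g, a), p and g in {A, B, C}, a in {0, 1}.
W = frozenset(product("ABC", "ABC", (0, 1)))
B = frozenset(s for s in W if s[0] == s[1])  # expansion of b ::= p = g
```

Applied to $[\{(A,A,0)\} : .98;\ \{(B,A,0), (C,A,0)\} : .02]$, `condition_on` yields $[\{(A,A,0)\} : .98;\ \emptyset : .02]$, the same result the text obtains for $(\dot{m}_l \otimes m_{pre})|b$.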

VII. Inference Scheme 

This section presents an inference scheme that an attacker uses to update her knowledge from interacting with a program's execution. This scheme is a generalization of the experiment protocol advanced in [2]; however, it surpasses that protocol by handling any number of secret inputs to a program. Before describing this scheme, we need to give the attacker's model.

A. Attacker's Model 

The attacker is modeled via the following assumptions: 

1) The attacker has a copy of the program's source code. 

2) The program has a number of secret inputs the attacker 
does not know and would like to learn. 

3) The program executes on a system that does not inten- 
tionally collude to leak the secret inputs. 



4) The program always terminates and preserves the state 
of secret inputs as high. 

5) The program executes once per interaction with the 
attacker, and in each execution the attacker is allowed 
to make only one observation. 

6) The attacker can monitor the public output of the program and adaptively change the input.

7) The attacker knows the frame of each secret input, and 
the values of all of the nonsecret inputs. 

8) The impossible world is not a true value of any of the inputs [15]. Therefore, the attacker's belief is captured via a normalized mass function, which assigns a zero degree of belief to the impossible world (the empty set), as we saw in Definition 4.

B. Scheme Description 

At first, the attacker has an initial belief about the true values of the secret inputs. The extent of this belief is captured using an initial mass function $m_{init} : \mathcal{P}(W_h) \to [0, 1]$. This function can reflect either the attacker's initial total ignorance or her belief in an initial piece of evidence she obtained through some means. In the former case, the attacker knows that the true values of the secret inputs are in the frame $W_h$; however, she has no evidence whatsoever about their location in any subset of that frame, which gives $m_{init}(W_h) = 1$ and $m_{init}(A) = 0$ for any $A \in \mathcal{P}(W_h) \setminus \{W_h\}$. In the latter case, the degree of the initial belief is distributed (unequally in general) among a number of sets $I_1, \ldots, I_m \in \mathcal{P}(W_h)$ such that $m_{init}(I_1) = i_1 > 0, \ldots, m_{init}(I_m) = i_m > 0$, $m_{init}(W_h) = 1 - i_1 - \cdots - i_m$, and $i_1 + \cdots + i_m < 1$.
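The two cases for $m_{init}$ can be captured in a short Python sketch. It is a minimal illustration under our own assumptions: sets are frozensets, a mass function is a dict from focal sets to masses, and the helper name `initial_mass` is hypothetical.

```python
def initial_mass(frame, evidence=None):
    """Build m_init on P(W_h) following the two cases described above.

    `evidence` maps nonempty subsets I_j of the frame (as frozensets) to
    masses i_j > 0 with i_1 + ... + i_m < 1; the remainder is assigned to the
    whole frame W_h. With no evidence, the result is the vacuous mass
    function m_init(W_h) = 1 (total ignorance).
    """
    frame = frozenset(frame)
    evidence = dict(evidence or {})
    if any(not s or not s <= frame for s in evidence):
        raise ValueError("each I_j must be a nonempty subset of the frame")
    if any(v <= 0 for v in evidence.values()):
        raise ValueError("each i_j must be strictly positive")
    total = sum(evidence.values())
    if total >= 1:
        raise ValueError("i_1 + ... + i_m must be less than 1")
    m = dict(evidence)
    # The whole frame receives whatever mass the evidence leaves over.
    m[frame] = m.get(frame, 0.0) + (1.0 - total)
    return m
```

For example, `initial_mass({"A", "B", "C"})` gives the vacuous function on $W_h = \{A, B, C\}$, while passing `{frozenset({"A"}): 0.6}` leaves mass $0.4$ on $W_h$.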

Without relying on monitoring a program execution, the attacker soon obtains a finite number $n$ of pieces of evidence (through social engineering, say) from $n$ independent sources (independence was discussed in Section IV-A) about the true values of the secret inputs. The extent of these $n$ pieces of evidence is captured using $n$ mass functions $m_i : \mathcal{P}(W_h) \to [0, 1]$, where $i = 1, \ldots, n$.

Before experimenting with a program execution, the attacker ought to combine the mass functions she has using formula (1). The combination outcome is the attacker's prebelief $m_{pre}$, which describes her belief before interacting with the program:

$$m_{pre} : \mathcal{P}(W_h) \to [0, 1] : m_{pre}(A) = \Big(m_{init} \otimes \bigotimes_{i=1}^{n} m_i\Big)(A)$$
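Assuming that formula (1) is Dempster's normalized rule of combination for mass functions on the same frame (our reading; the formula itself appears earlier in the paper), the prebelief can be sketched as a left fold of $m_{init}$ over the $n$ sources. The names `dempster` and `prebelief` are hypothetical.

```python
from functools import reduce


def dempster(m1, m2):
    """Dempster's rule on a common frame: intersect focal sets, multiply
    masses, discard the conflict (mass on the empty set), and renormalize."""
    joint = {}
    for a, x in m1.items():
        for b, y in m2.items():
            c = a & b
            joint[c] = joint.get(c, 0.0) + x * y
    conflict = joint.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the combination is undefined")
    return {c: v / (1.0 - conflict) for c, v in joint.items()}


def prebelief(m_init, sources):
    """m_pre: fold m_init with the independent sources m_1, ..., m_n."""
    return reduce(dempster, sources, m_init)
```

With a vacuous $m_{init}$, the prebelief simply equals the combination of the sources; with no sources at all, it equals $m_{init}$.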

The system chooses the high projection of the input state $\sigma_{ih} \in \mathcal{P}(W_h)$ to be the set that contains the true values of the secret inputs. The corresponding point mass function is:

$$\dot{m}_h : \mathcal{P}(W_h) \to [0, 1] : \dot{m}_h(\sigma_{ih}) = 1,\quad \dot{m}_h(A) = 0 \text{ for any } A \in \mathcal{P}(W_h) \setminus \{\sigma_{ih}\}$$

The attacker chooses the low projection of the input state $\sigma_{il} \in \mathcal{P}(W_l)$, with the corresponding point mass function:

$$\dot{m}_l : \mathcal{P}(W_l) \to [0, 1] : \dot{m}_l(\sigma_{il}) = 1,\quad \dot{m}_l(A) = 0 \text{ for any } A \in \mathcal{P}(W_l) \setminus \{\sigma_{il}\}$$

The low projection $\sigma_{il}$ represents the attacker's guesses of the secret inputs, in addition to the nonsecret inputs. These guesses are likely to be influenced by the attacker's prebelief, in which case the attacker would choose $\sigma_{il}$ as the set that has the highest mass according to $m_{pre}$. However, we do not impose such an influence, to avoid loss of generality.

The program's input becomes the combination $\dot{m}_h \otimes \dot{m}_l$, computed using formula (2), since the domains of $\dot{m}_h$ and $\dot{m}_l$ are different. The system executes the program $S$, which produces a mass function:

$$m_S : \mathcal{P}(W_{h \cup l}) \to [0, 1] : m_S(A) = [\![S]\!](\dot{m}_h \otimes \dot{m}_l)(A)$$

This mass function represents many possible output states. However, since the attacker is allowed to make only one observation per execution, one state must be chosen randomly. This random choice is made using a sampling operator $\mathcal{V}$ that draws a state $\sigma'$ from the domain of $m_S$ with probability $1/|\mathcal{F}_{m_S}|$. The chosen output state is $\sigma' \in \partial(m_S)$, from which the attacker observes the low projection $o = \sigma'_l \in \mathcal{P}(W_l)$.
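The sampling step can be sketched as follows. This is only an illustration under stated assumptions: we take the states available for sampling to be those occurring in focal sets with positive mass, we sample them uniformly, and `low_indices` (the tuple positions of the low variables, e.g. `(1, 2)` for $(p, g, a)$ states) is a hypothetical parameter of ours.

```python
import random


def support_states(m):
    """States occurring in the focal sets of m that carry positive mass."""
    return sorted({s for focal, v in m.items() if v > 0 for s in focal})


def sample_output(m_s, low_indices, rng=random):
    """Draw an output state sigma' uniformly from the states of m_s and
    return it together with its low projection o."""
    sigma = rng.choice(support_states(m_s))
    o = tuple(sigma[i] for i in low_indices)
    return sigma, o
```

Passing `random.Random(seed)` as `rng` makes the draw reproducible. For $m_S = [\{(A,A,1)\} : 1]$ the only possible outcome is $\sigma' = (A, A, 1)$ with $o = (A, 1)$, as in Section VIII-A.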

The attacker applies the semantics of the program to the combination $\dot{m}_l \otimes m_{pre}$ to generate a prediction $m'_S$ of the output mass $m_S$:

$$m'_S : \mathcal{P}(W_{h \cup l}) \to [0, 1] : m'_S(A) = [\![S]\!](\dot{m}_l \otimes m_{pre})(A)$$

The attacker incorporates any additional information contained in the observation $o$ she made earlier by conditioning $m'_S$ on $o$ using formula (3). The result is a new mass function $m''_S$, which the attacker projects to $h$ to obtain her postbelief $m_{post} = m''_S \upharpoonright h$, describing her belief after interacting with the program.

It is worth pointing out that in repeated executions, the 
attacker may choose her postbelief from one execution as a 
prebelief to the next. The attacker may even choose a prebelief 
that contradicts the pieces of evidence she has. Both choices 
are acceptable and add ample expressiveness to our analysis. 

VIII. Experimenting with the Inference Scheme 

Unlike the QIF method used in [1], which can only deal with singleton focal sets induced by an attacker's beliefs, our method is capable of handling all focal set structures. These include, in addition to singleton focal sets, focal sets that form a partition, overlapping focal sets, and nested focal sets. Experimenting with our scheme using singleton sets yields results identical to those in [2]. We also find that experimenting with overlapping sets is rather similar to experimenting with nested sets. Therefore, we experiment only with partition and nested sets. For the purpose of our experiments, we reuse the same password checker from [2]. This checker sets an authentication flag $a$ after checking a stored password $p$ against a guessed password $g$ supplied by the user.

PWC : if p = g then a := 1 else a := 0

The secret input to PWC is $p$, while the nonsecret ones are $g$ and $a$. The universal variable set is $\Gamma = \{p, g, a\}$, and the high and low variable sets are $h = \{p\}$ and $l = \{g, a\}$, respectively. For simplicity, $p$ is assumed to be either A, B, or C. Each experiment involves two runs of interaction between the attacker and PWC. The real password is assumed to be A in the first run and C in the second.



TABLE V

The attacker's prebelief and postbeliefs in Experiment 1

| $\mathcal{P}(W_h)$ | $m_{pre}$ | $m_{post}$ | $m'_{post}$ |
|---|---|---|---|
| {A} | .98 | 1 | |
| {B,C} | .02 | | 1 |


TABLE VI

An intermediate table for computing $\dot{m}_h \otimes \dot{m}_l$

| $\dot{m}_h$ / $\dot{m}_l$ | {(A,A,0), (B,A,0), (C,A,0)} : 1 |
|---|---|
| {(A,A,0), (A,A,1), (A,B,0), (A,B,1), (A,C,0), (A,C,1)} : 1 | {(A,A,0)} : 1 |





A. Experiment 1 

In this experiment, the focal sets induced by $m_{pre}$ form a partition, as shown in Table V. Notice that the attacker believes $p$ is overwhelmingly likely to be A, but assigns a very small chance (not necessarily equally distributed) to its being either B or C.

1) Interaction 1: The system chooses $\sigma_{ih} = (p \mapsto A)$ and the attacker chooses $\sigma_{il} = (g \mapsto A, a \mapsto 0)$. The corresponding $\dot{m}_h$ and $\dot{m}_l$ are given in Section VII-B. The program input $\dot{m}_h \otimes \dot{m}_l$ is determined by applying formula (2). We simplify this task by performing the intermediate computations shown in Table VI [12]. The first column of this table contains $\dot{m}_h$ and the top row contains $\dot{m}_l$, both extended to the union variable domain $h \cup l = \{p, g, a\}$. Every internal cell contains the intersection of the corresponding tuple sets and the product of the corresponding values. The combination is finalized by adding the values of all internal cells with equal tuple sets and normalizing by $k = 1$ to obtain:

$$\dot{m}_h \otimes \dot{m}_l = [\{(A, A, 0)\} : 1]$$
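The grid computation of Table VI can be sketched in Python, assuming (as the table suggests) that formula (2) extends both mass functions to the joint domain $h \cup l$, intersects focal sets, multiplies masses, pools equal tuple sets, and normalizes by $k$. The value sets and helper names (`extend_high`, `extend_low`, `combine`) are hypothetical and specific to PWC.

```python
from itertools import product

# Hypothetical value sets for the password checker example.
P_VALUES = ("A", "B", "C")  # values of p (high)
G_VALUES = ("A", "B", "C")  # values of g (low)
A_VALUES = (0, 1)           # values of a (low)


def extend_high(m_h):
    """Extend a mass on subsets of W_h (values of p) to tuples (p, g, a)."""
    return {frozenset(product(sorted(f), G_VALUES, A_VALUES)): v
            for f, v in m_h.items()}


def extend_low(m_l):
    """Extend a mass on subsets of W_l (pairs (g, a)) to tuples (p, g, a)."""
    return {frozenset((p, g, a) for p in P_VALUES for (g, a) in f): v
            for f, v in m_l.items()}


def combine(m1, m2):
    """Fill the cells of a Table VI-style grid and finalize the combination."""
    cells = {}
    for f1, v1 in m1.items():
        for f2, v2 in m2.items():
            key = f1 & f2  # intersection of the tuple sets
            cells[key] = cells.get(key, 0.0) + v1 * v2  # pool equal tuple sets
    conflict = cells.pop(frozenset(), 0.0)
    k = 1.0 / (1.0 - conflict)  # normalization constant
    return {f: v * k for f, v in cells.items()}


m_h = {frozenset({"A"}): 1.0}       # system: p -> A
m_l = {frozenset({("A", 0)}): 1.0}  # attacker: g -> A, a -> 0
program_input = combine(extend_high(m_h), extend_low(m_l))
```

Here `program_input` is `{frozenset({("A", "A", 0)}): 1.0}`, matching the result above. The same `combine` call on `extend_low(m_l)` and `extend_high(m_pre)` reproduces $\dot{m}_l \otimes m_{pre}$ later in this experiment.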

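Next, the semantics of PWC, given in Table IV, is applied:

$$[\![PWC]\!](\dot{m}_h \otimes \dot{m}_l) = [\![c_0]\!]((\dot{m}_h \otimes \dot{m}_l)|b) + [\![c_1]\!]((\dot{m}_h \otimes \dot{m}_l)|\neg b)$$

where:

$$c_0 ::= a := 1,\quad c_1 ::= a := 0,\quad b ::= p = g,\quad \neg b ::= p \neq g \tag{4}$$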
The expansion of $b$ to $\mathcal{P}(W_{h \cup l})$ yields:

$$B = \{(A,A,0), (A,A,1), (B,B,0), (B,B,1), (C,C,0), (C,C,1)\} \tag{5}$$

Applying Definition 19, conditioning gives:

$$(\dot{m}_h \otimes \dot{m}_l)|b = [\{(A,A,0)\} : 1]$$
The expansion of $\neg b$ to $\mathcal{P}(W_{h \cup l})$ yields:

$$\neg B = \{(A,B,0), (A,B,1), (A,C,0), (A,C,1), (B,A,0), (B,A,1), (C,A,0), (C,A,1), (B,C,0), (B,C,1), (C,B,0), (C,B,1)\} \tag{6}$$
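The expansions (5) and (6) can be checked mechanically. The short Python sketch below assumes states are $(p, g, a)$ tuples over the value sets used in this experiment, enumerates the 18 states of the joint frame, and splits them by $b$; the names `W`, `B`, and `NOT_B` are ours.

```python
from itertools import product

# Joint frame W_{h u l}: states (p, g, a) with p, g in {A, B, C} and a in {0, 1}.
W = frozenset(product("ABC", "ABC", (0, 1)))

B = frozenset(s for s in W if s[0] == s[1])  # expansion of b ::= p = g, as in (5)
NOT_B = W - B                                # expansion of not b ::= p != g, as in (6)

print(len(W), len(B), len(NOT_B))  # 18 6 12
```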

Applying Definition 19 again, conditioning gives:

$$(\dot{m}_h \otimes \dot{m}_l)|\neg b = [\emptyset : 1]$$



Now we apply the mass updates, as described in Definition 18:

$$[\![c_0]\!]((\dot{m}_h \otimes \dot{m}_l)|b) = [\{(A,A,1)\} : 1]$$
$$[\![c_1]\!]((\dot{m}_h \otimes \dot{m}_l)|\neg b) = [\emptyset : 1]$$

A straightforward addition gives:

$$[\![PWC]\!](\dot{m}_h \otimes \dot{m}_l) = [\{(A,A,1)\} : 1;\ \emptyset : 1]$$

and a final normalization yields the output mass:

$$m_S = [\![PWC]\!](\dot{m}_h \otimes \dot{m}_l) = [\{(A,A,1)\} : 1]$$
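The system-side computation above can be replayed with a short, self-contained Python sketch. It hard-codes PWC's two branches following the lifted if rule of Table IV, assumes that the final normalization drops the empty set and rescales the remaining masses to sum to 1, and uses hypothetical helper names (`condition`, `assign_a`, `add`, `normalize`, `pwc`).

```python
from itertools import product

W = frozenset(product("ABC", "ABC", (0, 1)))  # states (p, g, a)
B = frozenset(s for s in W if s[0] == s[1])   # b ::= p = g
NOT_B = W - B                                 # not b ::= p != g


def condition(m, S):
    """Definition 19: move each focal set's mass onto its intersection with S."""
    out = {}
    for f, v in m.items():
        out[f & S] = out.get(f & S, 0.0) + v
    return out


def assign_a(m, n):
    """Definition 18 for X = a, the last component of each (p, g, a) state."""
    out = {}
    for f, v in m.items():
        g = frozenset((p, q, n) for (p, q, _) in f)
        out[g] = out.get(g, 0.0) + v
    return out


def add(m1, m2):
    """Pointwise sum of two (unnormalized) mass functions."""
    out = dict(m1)
    for f, v in m2.items():
        out[f] = out.get(f, 0.0) + v
    return out


def normalize(m):
    """Assumed normalization: drop the empty set, rescale the rest to sum to 1."""
    rest = {f: v for f, v in m.items() if f}
    total = sum(rest.values())
    return {f: v / total for f, v in rest.items()}


def pwc(m):
    """Lifted if p = g then a := 1 else a := 0, following Table IV."""
    return add(assign_a(condition(m, B), 1), assign_a(condition(m, NOT_B), 0))


program_input = {frozenset({("A", "A", 0)}): 1.0}
unnormalized = pwc(program_input)  # {(A,A,1)}: 1 and the empty set: 1
m_S = normalize(unnormalized)      # {(A,A,1)}: 1
```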

The only state that can be drawn from $\partial(m_S)$ is $(A, A, 1)$, from which the attacker observes the low projection:

$$\sigma' = (p \mapsto A, g \mapsto A, a \mapsto 1)$$
$$o = \sigma'_l = (g \mapsto A, a \mapsto 1)$$

Next, $\dot{m}_l \otimes m_{pre}$ is determined by applying formula (2):

$$\dot{m}_l \otimes m_{pre} = [\{(A,A,0)\} : .98;\ \{(B,A,0), (C,A,0)\} : .02]$$

The semantics of PWC is now applied to get:

$$[\![PWC]\!](\dot{m}_l \otimes m_{pre}) = [\![c_0]\!]((\dot{m}_l \otimes m_{pre})|b) + [\![c_1]\!]((\dot{m}_l \otimes m_{pre})|\neg b)$$

where $c_0$, $c_1$, $b$, and $\neg b$ are the same as in (4). Applying Definition 19 with the same (5) and (6) yields:

$$(\dot{m}_l \otimes m_{pre})|b = [\{(A,A,0)\} : .98;\ \emptyset : .02]$$
$$(\dot{m}_l \otimes m_{pre})|\neg b = [\{(B,A,0), (C,A,0)\} : .02;\ \emptyset : .98]$$

The mass updates are now applied to get:

$$[\![c_0]\!]((\dot{m}_l \otimes m_{pre})|b) = [\{(A,A,1)\} : .98;\ \emptyset : .02]$$
$$[\![c_1]\!]((\dot{m}_l \otimes m_{pre})|\neg b) = [\{(B,A,0), (C,A,0)\} : .02;\ \emptyset : .98]$$

A straightforward addition gives:

$$[\![PWC]\!](\dot{m}_l \otimes m_{pre}) = [\{(A,A,1)\} : .98;\ \{(B,A,0), (C,A,0)\} : .02;\ \emptyset : 1]$$

and a final normalization yields the attacker's prediction:

$$m'_S = [\![PWC]\!](\dot{m}_l \otimes m_{pre}) = [\{(A,A,1)\} : .98;\ \{(B,A,0), (C,A,0)\} : .02]$$

After expanding $o$ to the domain of $m'_S$ and obtaining:

$$O = \{(A,A,1), (B,A,1), (C,A,1)\}$$

the attacker conditions using formula (3) to get:

$$m''_S = m'_S|o = [\{(A,A,1)\} : 1], \text{ where } k = 1/.98$$

A final projection of $m''_S$ to $h$ yields $m_{post}$, shown in Table V.
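The attacker's side of Interaction 1 can be replayed in the same way. The sketch below is self-contained and hypothetical: it starts from $\dot{m}_l \otimes m_{pre}$ as computed in the text, assumes the same normalization (drop $\emptyset$, rescale the rest), and treats conditioning on $o$ as intersection with $O$ followed by normalization, which reproduces the factor $k = 1/.98$. All helper names are ours.

```python
from itertools import product

W = frozenset(product("ABC", "ABC", (0, 1)))  # states (p, g, a)
B = frozenset(s for s in W if s[0] == s[1])   # b ::= p = g


def pool(pairs):
    """Sum the masses of equal focal sets from an iterable of (focal, mass)."""
    out = {}
    for f, v in pairs:
        out[f] = out.get(f, 0.0) + v
    return out


def condition(m, S):
    """Intersect every focal set with S, pooling masses (no normalization)."""
    return pool((f & S, v) for f, v in m.items())


def assign_a(m, n):
    """Mass update m[a |-> n] on (p, g, a) states."""
    return pool((frozenset((p, q, n) for (p, q, _) in f), v) for f, v in m.items())


def normalize(m):
    """Assumed normalization: drop the empty set, rescale the rest to sum to 1."""
    rest = {f: v for f, v in m.items() if f}
    total = sum(rest.values())
    return {f: v / total for f, v in rest.items()}


def pwc(m):
    """Lifted PWC followed by the final normalization."""
    then_part = assign_a(condition(m, B), 1)
    else_part = assign_a(condition(m, W - B), 0)
    return normalize(pool(list(then_part.items()) + list(else_part.items())))


def project_h(m):
    """Project every focal set onto p, pooling equal projections."""
    return pool((frozenset(s[0] for s in f), v) for f, v in m.items())


# m_l (x) m_pre from the text, already extended to (p, g, a) states.
predicted_input = {
    frozenset({("A", "A", 0)}): 0.98,
    frozenset({("B", "A", 0), ("C", "A", 0)}): 0.02,
}
m_S_pred = pwc(predicted_input)                   # prediction m'_S
O = frozenset(s for s in W if s[1:] == ("A", 1))  # expansion of o = (g -> A, a -> 1)
m_S_cond = normalize(condition(m_S_pred, O))      # rescales by k = 1/.98
m_post = project_h(m_S_cond)                      # point mass on {A}
```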
2) Interaction 2: Computations similar to those presented for Interaction 1 yield $m'_{post}$, also shown in Table V.



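TABLE VII

The attacker's prebelief and postbeliefs in Experiment 2

| $\mathcal{P}(W_h)$ | $m_{pre}$ | $m_{post}$ | $m'_{post}$ |
|---|---|---|---|
| {A,B} | .98 | | |
| {A,B,C} | .02 | | |
| {A} | | 1 | |
| {B} | | | .98 |
| {B,C} | | | .02 |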



3) Reasoning About the Results: If we contemplate the results in Table V, $m_{post}$ suggests that the attacker is certain that $p$ is A, whereas $m'_{post}$ suggests that she is certain that $p$ is either B or C (with chances that are not necessarily equal). Comparing $m_{post}$ with $m_{pre}$ shows that interaction 1 brought about little change in the attacker's belief. This little change corresponds to a small update in the attacker's knowledge and, subsequently, to little information flow from PWC. If we compare $m'_{post}$ with $m_{pre}$, we arrive at the converse conclusion: a larger knowledge update and a larger flow. Notice also that $m_{post}$ and $m'_{post}$ are more accurate than $m_{pre}$, since each of them is nearer than $m_{pre}$ to the corresponding $\dot{m}_h$. This accuracy increase means that the attacker is informed, which is positive information flow.

B. Experiment 2 

In this experiment, the focal sets induced by $m_{pre}$ are nested, as shown in Table VII. Notice that the attacker believes $p$ is overwhelmingly likely to be either A or B, but assigns a very small chance to its being either A, B, or C (the chances are not necessarily equal). The attacker's postbeliefs $m_{post}$ and $m'_{post}$ are shown in the same table.

1) Reasoning About the Results: If we contemplate the results in Table VII, $m_{post}$ suggests that the attacker is certain that $p$ is A, whereas $m'_{post}$ suggests that she believes $p$ is overwhelmingly likely to be B but assigns a very small chance (not necessarily equally distributed) to its being either B or C. Comparing $m_{post}$ with $m_{pre}$ shows that interaction 1 brought about a large change in the attacker's belief (she no longer believes in {B}). This large change corresponds to a large knowledge update and a large flow. Comparing $m'_{post}$ with $m_{pre}$ yields the converse conclusion. Notice also that $m_{post}$ is more accurate than $m_{pre}$, since it is nearer to $\dot{m}_h$ than $m_{pre}$ is. This accuracy increase means that the attacker is informed, i.e., positive information flow. However, we cannot informally claim that $m'_{post}$ is more accurate than $m_{pre}$: both seem to stand at nearly the same distance from $\dot{m}_h$ (which is a point mass on {C}). This nearly constant accuracy reflects near-zero information flow.

C. Generic Observations 

We can derive generic, informal observations by putting the experiments' results into a wider perspective. If the attacker has a strong belief that the true value of a secret input is in one block of a partition (in a set nested in other sets in the body of evidence), and an interaction with the system refutes her belief, then the attacker's strong belief is transferred to that block's complement (to those sets' intersection).



IX. Measuring Information Flow 

The approach used in [2] to measure information flow, which relates it to an improvement in the accuracy of an attacker's belief, is applicable in our setting. Recall from Section IV-C2 that $GJS(m_1, m_2)$ measures the divergence between $m_1$ and $m_2$. The accuracy of the attacker's prebelief $m_{pre}$ is its distance from $\dot{m}_h$, measured as $GJS(m_{pre}, \dot{m}_h)$. Likewise, the accuracy of the attacker's postbelief $m_{post}$ is $GJS(m_{post}, \dot{m}_h)$. We define the amount of information flow $Q$ as the difference between these two quantities:

$$\begin{aligned} Q &= GJS(m_{pre}, \dot{m}_h) - GJS(m_{post}, \dot{m}_h) \\ &= 2\,GS\Big(\frac{m_{pre} + \dot{m}_h}{2}\Big) - 2\,GS\Big(\frac{m_{post} + \dot{m}_h}{2}\Big) - GS(m_{pre}) + GS(m_{post}) \end{aligned}$$

Calculating the amount of flow for the experiments conducted in Section VIII (interactions 1 and 2 of Experiment 1, then interactions 1 and 2 of Experiment 2) yields .020145, .97999, 1.01999, and .01999 bits, respectively [11]. These results are in line with the informal reasoning in Sections VIII-A3 and VIII-B1.

Unlike the metric proposed in [2], our measure has an intrinsic absolute range bounded by the size of a program's secret input, as proved in Theorem 1.

Theorem 1: Considering both deterministic and probabilistic programs and all types of an attacker's beliefs, and without imposing any admissibility restriction on those beliefs, the general range of flow reported by $Q$ is:

$$\Omega_Q = [-\eta, \eta]$$

where $\eta$ is the size of a program's secret input in bits.

Proof: The proof is given in Appendix I-B. ■
Additionally, the results reported by our measure are easily associated with the exhaustive search effort needed to uncover a program's secret information. This can be shown by assuming a program with a secret input of size $\eta$ bits and an informing flow of $k$ bits from that program to an attacker. The absolute upper bound of $Q$, given in Theorem 1, tells us that $k \leq \eta$. Therefore, the space of the exhaustive search [16] that must be carried out to reveal the residual $\eta - k$ bits of the secret input is $2^{\eta - k}$. In contrast, our earlier work [7] showed that the exhaustive search space cannot be established under the metric proposed in [2].
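The relationship between informing flow and residual search effort is simple arithmetic; the Python sketch below makes it explicit. The function name is hypothetical. The illustration assumes, as in the appendix, that $\eta = \log_2 |W_h|$, so PWC's secret $p \in \{A, B, C\}$ has $\eta = \log_2 3$ bits, and it plugs in the .97999-bit flow reported for Interaction 2 of Experiment 1.

```python
import math


def residual_search_space(eta_bits, k_bits):
    """Size of the exhaustive search left after an informing flow of k bits
    from a secret input of eta bits: 2 ** (eta - k)."""
    if not 0 <= k_bits <= eta_bits:
        raise ValueError("expected an informing flow with 0 <= k <= eta")
    return 2 ** (eta_bits - k_bits)


# Illustration: p ranges over {A, B, C}, so eta = log2(3) bits.
eta_pwc = math.log2(3)
print(residual_search_space(eta_pwc, 0.97999))  # roughly 1.52 candidates
```

With integer arguments the result is exact, e.g. a 64-bit secret that leaks 20 bits leaves $2^{44}$ candidates.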

X. Conclusions 

We presented a generalization of the QIF analysis method proposed in [1], [2]. Our generalization is based on Dempster-Shafer theory of imprecise probabilities. We uncovered a number of weaknesses in the original method and showed that our generalization eliminates them. Our generalized method can handle any number of secret inputs to a program, it enables capturing an attacker's beliefs in all kinds of sets (singleton or not), and it supports a new and precise QIF measure whose reported flow results are plausible, in that they are bounded by the size of a program's secret input and can be easily associated with the exhaustive search effort needed to uncover a program's secret information, unlike the results reported by the original metric.
Appendix I
Algorithms and Proofs 

A. Computing Aggregate Uncertainty 

Input: A belief function $Bel : \mathcal{P}(W_s) \to [0, 1]$ on $W_s$.
Output: $AU(Bel)$, as given in Definition 11.

1) Find a nonempty set $A \in \mathcal{P}(W_s)$ such that $Bel(A)/|A|$ is maximal. If more than one such set exists, take the one with the highest cardinality.

2) For each $x \in A$, put $p(x) = Bel(A)/|A|$.

3) For each $B \subseteq W_s - A$, put $Bel(B) = Bel(B \cup A) - Bel(A)$.

4) Put $W_s = W_s - A$.

5) If $W_s \neq \emptyset$ and $Bel(W_s) > 0$, go to step 1.

6) If $W_s \neq \emptyset$ and $Bel(W_s) = 0$, put $p(x) = 0$ for each $x \in W_s$.

7) Compute $AU(Bel) = -\sum_{x} p(x) \log p(x)$.
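Steps 1-7 can be transcribed directly into Python. This is a minimal, unoptimized sketch under our own assumptions: the belief function is supplied through its mass function (a dict from frozensets to masses, with $Bel(A) = \sum_{B \subseteq A} m(B)$), logarithms are taken base 2 so that $AU$ is in bits, and a small tolerance `eps` decides ties. Step 1 enumerates all subsets, so the cost grows exponentially with $|W_s|$. Function names are hypothetical.

```python
from itertools import combinations
from math import log2


def nonempty_subsets(frame):
    """All nonempty subsets of `frame`, as frozensets."""
    items = sorted(frame)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def aggregate_uncertainty(mass, frame, eps=1e-12):
    """AU(Bel) for the belief function induced by `mass` (steps 1-7).

    The reduced belief of step 3 is kept in closed form:
    Bel_k(B) = Bel(B | R) - Bel(R), where R is the union of removed sets.
    """
    def bel0(a):
        return sum(v for s, v in mass.items() if s and s <= a)

    removed = frozenset()
    remaining = frozenset(frame)
    p = {}

    def bel(b):
        return bel0(b | removed) - bel0(removed)

    while remaining:
        if bel(remaining) <= eps:              # step 6
            for x in remaining:
                p[x] = 0.0
            break
        best, best_ratio = None, -1.0
        for a in nonempty_subsets(remaining):  # step 1, ties -> larger set
            ratio = bel(a) / len(a)
            if ratio > best_ratio + eps or (
                abs(ratio - best_ratio) <= eps and len(a) > len(best)
            ):
                best, best_ratio = a, ratio
        for x in best:                         # step 2
            p[x] = best_ratio
        removed |= best                        # step 3 (via the closure)
        remaining -= best                      # step 4, then loop (step 5)
    return -sum(q * log2(q) for q in p.values() if q > 0)  # step 7, in bits
```

As sanity checks, total ignorance on a three-element frame gives $\log_2 3$ bits, and a point mass on a singleton gives 0.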

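B. Proof of Theorem 1

From Section IV-C2, for any two mass functions on $W_s$:

$$0 \leq GJS(m_1, m_2) \leq \log |W_s| = \eta$$

Since both GJS terms of $Q$ lie in $[0, \eta]$, their difference satisfies:

$$-\eta \leq GJS(m_{pre}, \dot{m}_h) - GJS(m_{post}, \dot{m}_h) \leq \eta$$

$$\Omega_Q = [-\eta, \eta] \qquad \blacksquare$$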



